The Meta-pi Network: Connectionist Rapid Adaptation for High-performance Multi-speaker Phoneme Recognition
Authors
Abstract
We present a multi-network connectionist architecture based on the Time-Delay Neural Network (TDNN) that performs multi-speaker phoneme discrimination (/b,d,g/) at the speaker-dependent recognition rate of 98.4%. The overall network gates the phonemic decisions of modules trained on individual speakers to form its overall classification decision. By dynamically adapting to the input speech and focusing on a combination of speaker-specific modules, the network outperforms a single TDNN trained on the speech of all six speakers (95.9%). To train this network we have developed a new form of multiplicative connection that we call the “Meta-Pi” connection. We illustrate how the Meta-Pi paradigm implements a dynamically adaptive Bayesian MAP classifier. It learns without supervision to recognize the speech of one particular speaker (99.8%) using a dynamic combination of internal models of other speakers exclusively. The Meta-Pi model is a viable basis for a connectionist speech recognition system that can rapidly adapt to new speakers and varying speaker dialects.
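The gating idea in the abstract can be illustrated with a minimal sketch. It assumes that the gating weights come from a softmax over one activation per speaker module and that the overall decision is the weighted sum of the module outputs. The abstract states only that a multiplicative connection gates the modules' decisions, so the softmax form, the function names, and all numbers below are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(activations):
    """Normalize raw gate activations into positive weights that sum to 1."""
    m = max(activations)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def meta_pi_combine(module_outputs, gate_activations):
    """Gate per-speaker module outputs into one overall class distribution.

    module_outputs: one list of class scores per speaker-specific module.
    gate_activations: one raw gating activation per module (hypothetical).
    """
    weights = softmax(gate_activations)
    n_classes = len(module_outputs[0])
    combined = [
        sum(w * out[c] for w, out in zip(weights, module_outputs))
        for c in range(n_classes)
    ]
    return combined, weights

# Hypothetical /b,d,g/ scores from three speaker-specific modules.
module_outputs = [
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.30, 0.30, 0.40],
]
# Hypothetical gate activations that favor the first module.
gate_activations = [2.0, 0.0, 0.0]

combined, weights = meta_pi_combine(module_outputs, gate_activations)
predicted = ["b", "d", "g"][combined.index(max(combined))]
```

Because each gating weight multiplies a module's output, an error signal on the combined output can be propagated back to the gate activations without any speaker labels, which is consistent with the unsupervised adaptation the abstract describes.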
Related Resources
Connectionist Architectures for Multi-Speaker Phoneme Recognition
We present a number of Time-Delay Neural Network (TDNN) based architectures for multi-speaker phoneme recognition (/b,d,g/ task). We use speech of two females and four males to compare the performance of the various architectures against a baseline recognition rate of 95.9% for a single TDNN on the six-speaker /b,d,g/ task. This series of modular designs leads to a highly modular multi-network ...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
Connectionist Transformation Network Features for Speaker Recognition
Alternative approaches to conventional short-term cepstral modelling of speaker characteristics have been proposed and successfully incorporated into current state-of-the-art systems for speaker recognition. Particularly, the use of adaptation transforms employed in speech recognition systems as features for speaker recognition is one of the most appealing recent proposals. In this paper, we also...